Long-range PCR remains a flexible, fast, efficient and cost-effective choice for sequencing candidate 
genomic regions in a small number of samples, especially when combined with next-generation sequencing 
(NGS) platforms. Several long-range DNA polymerases are advertised as being able to amplify up to 15 kb 
or longer genomic DNA. However, their real-world performance characteristics and their suitability for 
NGS remain unclear. We evaluated six long-range DNA polymerases (Invitrogen SequalPrep, Invitrogen 
AccuPrime, TaKaRa PrimeSTAR GXL, TaKaRa LA Taq Hot Start, KAPA Long Range HotStart and 
QIAGEN LongRange PCR Polymerase) to amplify three amplicons, with sizes of 12.9 kb, 9.7 kb, and 5.8 kb, 
respectively. Subsequently, we used the PrimeSTAR enzyme to amplify entire BRCA1 (83.2 kb) and BRCA2 
(84.2 kb) genes from nine subjects and sequenced them on an Illumina MiSeq sequencer. We found that the 
TaKaRa PrimeSTAR GXL DNA polymerase can amplify almost all amplicons with different sizes and Tm 
values under identical PCR conditions. Other enzymes require alteration of PCR conditions to obtain 
optimal performance. From the MiSeq run, we identified multiple intronic and exonic single-nucleotide 
variations (SNVs), including one mutation (c.5946delT in BRCA2) in a positive control. Our study provided 
useful results for sequencing research focused on large genomic regions. 



Since its inception, the polymerase chain reaction (PCR) has become one of the most indispensable tools in 
molecular biology for cloning small DNA fragments 1,2. However, traditional PCR was limited by 
the maximum size of the amplified fragment. In 1992, Barnes 3 developed new PCR conditions to allow for 
amplification of up to 5 kb. Long-range PCR increased the size of amplicons from 3-5 kb to over 30 kb by 
modifying the polymerases. These technical advances have brought the speed and simplicity of PCR to genomic 
mapping and sequencing, and have facilitated studies in molecular genetics 4,5. When combined with sequencing, 
long-range PCR can achieve higher sensitivity and provide a faster and more cost-effective tool for detecting 
genetic variations 6,7. 

Multiple long-range DNA polymerases are commercially available to amplify long genomic fragments. Some of 
them are advertised as being able to amplify 15 kb or longer genomic DNA and can work well for specific 
genomic regions under highly optimized conditions. However, apart from manufacturers' flyers, the literature 
says little about the advantages and disadvantages of each enzyme, and their real-world performance on 
randomly chosen amplicons is uncertain. Since many next-generation sequencing (NGS) experiments can benefit 
from long-range PCR, knowing the different characteristics of the enzymes will help in selecting 
enzymes and optimizing experimental conditions. Therefore, we compared six long-range DNA polymerases and 
attempted to amplify three amplicons of various sizes, to identify enzymes that perform well with 
minimal requirements for condition optimization. Subsequently, we chose one enzyme to amplify the entire 
BRCA1 and BRCA2 genes (including introns and exons) for sequencing to further evaluate its performance for 
NGS. 

A new generation of personal genome sequencers, such as the Illumina MiSeq and Ion Torrent PGM, is 
becoming popular in research and clinical settings. These sequencers have lower throughput and higher per-base 
cost than the Illumina HiSeq or Ion Proton, but their versatility and flexibility make them ideal for small labs where 
investigators prefer fast turnaround time. For example, the MiSeq sequencer allows assembly of small genomes 
or detection of variants in candidate regions with high accuracy, and the latest model can now generate 
2 × 300 paired-end reads and up to 15 Gb of data in a single run. A previous study successfully used 
long-range PCR to sequence BRCA1 and BRCA2 on the Illumina Genome Analyzer II, but tested only one enzyme, 
Invitrogen's SequalPrep 8. In the current study, we selected the Illumina MiSeq sequencer to determine 
whether the combined use of long-range PCR and MiSeq can identify exonic and intronic mutations in two 
important genes known to confer susceptibility to breast cancer. 

Methods 

Enzymes and Amplicons. We evaluated six commercially available long-range 
enzymes: SequalPrep polymerase (Invitrogen, Carlsbad, CA), AccuPrime 
Taq DNA Polymerase (Invitrogen, Carlsbad, CA), PrimeSTAR GXL polymerase 
(TaKaRa Bio, Shiga, Japan), LA Taq Hot Start Version Polymerase (TaKaRa Bio, 
Osaka, Japan), KAPA Long Range HotStart DNA polymerase (KAPA Biosystems, 
Woburn, MA) and QIAGEN LongRange PCR Polymerase (QIAGEN, Hilden, Germany). These 
enzymes were selected based on our knowledge at the time of the experiments and 
on Internet searches. This is not a comprehensive list, and we 
acknowledge that other similar enzymes are also commercially available, such as New 
England Biolabs Phusion HF Polymerase and LongAmp Taq DNA Polymerase, and 
Roche Expand Long Range DNA Polymerase. Readers should not assume that the 
six enzymes used in the current study are superior to those not included here. 

Three amplicons were selected as the targets for comparing the six long-range PCR 
enzymes because of their variable lengths and variable primer Tm values. The PCR 
primers for Brca1.1, Brca1.6 and Brca2.8 were synthesized by Integrated DNA Technologies 
(Coralville, IA). The three PCR amplicons have sizes of 12.9 kb, 9.7 kb and 5.8 kb, 
and Tm values of 54°C, 63.3°C and 54.5°C, respectively (Table 1). 

After comparing these six long-range PCR enzymes, we used PrimeSTAR to 
amplify all amplicons for the entire BRCA1/2 genes. Seventeen pairs of primers were 
synthesized by Integrated DNA Technologies (Coralville, IA), of which nine covered 
BRCA1 and eight covered BRCA2, with amplicon sizes ranging from 5.8 kb to 13.6 kb (Table 1). 
Most of the primers were taken from Ozcelik et al 8, and three pairs of primers were 
designed with Primer3 9. 

Reaction mixture and PCR conditions. To evaluate the performance of the different 
enzymes, we used each enzyme to amplify DNA samples from de-identified human 
subjects. The study was reviewed and approved by the Institutional Review Board of 
the University of Southern California (#HS-14-00425). Each of the six long-range PCR 
enzymes was used to amplify the three amplicons using the same genomic DNA sample 
as the template. Because the amplification protocols differ between enzymes, 
all experiments were designed according to the 
reaction mixture and cycling conditions in the manual of the corresponding 
enzyme, and we also optimized the PCR conditions for each enzyme according to 
preliminary results. Reactions were performed on an Eppendorf Mastercycler 
(Hamburg, Germany). To assess the success of a long-range PCR amplification, the 
final PCR product was run on a 0.8% agarose gel and visualized by staining with 
GelGreen Nucleic Acid Stain (Biotium, Hayward, CA). These amplicons were 
generated using the reaction mixtures and PCR conditions listed in Table 2. 

We further used the 2-step PCR condition of PrimeSTAR to amplify all amplicons 
in the BRCA1/2 genes. The Brca1.9 amplicon remained difficult to amplify 
even after we re-designed multiple pairs of primers, possibly because of secondary 
structures forming during PCR amplification. After we added 0.4 µL dimethyl sulfoxide 
(DMSO) to the 20 µL reaction mixture to interfere with self-complementarity, the 
amplicon was successfully amplified multiple times. All the other amplicons could be 
amplified using the standard 2-step protocol of PrimeSTAR. 

Library preparation and NGS for PCR amplicons. We purified all the amplicons 
using the Agencourt AMPure XP PCR Purification system (Beckman Coulter, 
Pasadena, CA) and quantified the starting DNA library using the Qubit dsDNA BR 
Assay system (Invitrogen, Carlsbad, CA). The sequencing libraries were constructed 
according to the Nextera XT sample preparation guide (Illumina, San 
Diego, CA), which uses a transposome to fragment the DNA and simultaneously add 
adapter and barcode sequences. 

The pooled and barcoded libraries were subsequently sequenced on the MiSeq 
sequencer with v2 kits, which generated 250-base paired-end sequence reads. 



Table 1 | Primer Sequences for the BRCA1 and BRCA2 Genomic Region 

Primer ID  Genomic sequence (GRCh37/hg19)  Primer Sequence                     Tm (°C)  Length of Amplicon (bp)
Brca1.1*   Chr17:41194339-41207207         5'-ACCCCAACATTGATTCCTTTC-3'         53.2     12896
                                           5'-CACAGGGAGAAAGTCTGCAAG-3'         55.7
Brca1.2    Chr17:41207139-41215625         5'-GGGAGCTGAGAAAGCAGCCAGC-3'        62.9     8487
                                           5'-TCGGCAGGAATCCATGTGCAGC-3'        62.4
Brca1.3    Chr17:41215424-41229078         5'-AGCAGAAGAACGTGCTCTTTTCACGG-3'    61.2     13655
                                           5'-ACAGTCTTCAATGTGGAGGCAGTAGGG-3'   61.8
Brca1.4    Chr17:41227538-41239085         5'-CTGGATTGAAGATGGGTGAGA-3'         54.1     11548
                                           5'-TTTCCTGTACCTTGCCAACAC-3'         55.5
Brca1.5    Chr17:41238861-41251840         5'-GGCAATCCTGAAGAAGTGGA-3'          54.8     12980
                                           5'-ACAAAGCAGCGGATACAACC-3'          55.8
Brca1.6*   Chr17:41246132-41255853         5'-GGGGAGGCTTGCCTTCTTCCG-3'         62.9     9722
                                           5'-CTGTGCCCGGCCGGTAAAACC-3'         63.6
Brca1.7    Chr17:41255072-41264003         5'-GCCATGGCACCCAGCTGAAGTA-3'        62.1     8932
                                           5'-CTGGGAGCGATACCCCCATGCT-3'        63.4
Brca1.8    Chr17:41258624-41271477         5'-GCCATGAAAAGATAATCTCACAACTGC-3'   56.6     12854
                                           5'-GGTGGCTCTGCTTATATACACAACTGG-3'   59.0
Brca1.9    Chr17:41270116-41281112         5'-GAAAGGTTTCACTGAGGTGAGACTA-3'     56.2     10997
                                           5'-ACAAGTTAGCTTTTCCTCCACATC-3'      55.3
Brca2.1    Chr13:32888055-32900396         5'-CTCCCCCACAAAAAGGGGACAAAGC-3'     62.3     12342
                                           5'-ACAAACTCCCACATACCACTGGGGG-3'     62.6
Brca2.2    Chr13:32900267-32911248         5'-CACCACAAAGAGATAAGTCAGGTAT-3'     54.3     10982
                                           5'-TCGTTTACACAAGTCAAGTCTG-3'        52.8
Brca2.3    Chr13:32910302-32923088         5'-GCCACTGTGCCCAAACACTACC-3'        60.9     12787
                                           5'-TGTGCCTGGCCTCAATTCACCA-3'        61.6
Brca2.4    Chr13:32922508-32935813         5'-TGACCCACAGTAAGGCACATC-3'         57.0     13306
                                           5'-GCCCTCTTCTACCATTTGTGC-3'         56.1
Brca2.5    Chr13:32933399-32945178         5'-GGCCTTATGGTAGATTCCTCCCCCG-3'     62.5     11780
                                           5'-TGGGCCTCCACATATTTTGCTGCTT-3'     61.3
Brca2.6    Chr13:32945170-32958766         5'-GGAGGCCCAACAAAAGAGAC-3'          56.1     13597
                                           5'-GTGGTTTAGCCGGACTCCTC-3'          57.6
Brca2.7    Chr13:32958373-32969763         5'-GGCAACTGTACAGGCAGACA-3'          57.5     11391
                                           5'-GTCTGGGTTCTGCATTCGAT-3'          55.0
Brca2.8*   Chr13:32969199-32974976         5'-TTAATTGCCCATGAACCTCAG-3'         53.0     5778
                                           5'-TATTGACTTGTATTGTGTTCGCTGT-3'     54.9

*The three amplicons selected for comparing six long-range PCR enzymes. 

Table 2 | The protocol of the long-range PCR reactions for different enzymes 

SequalPrep 
  Reaction mixture: 70 ng template DNA, 2 µL of 2.5 µmol/L primers, 2 µL 10X Reaction Buffer, [illegible], [illegible] µL DMSO, [illegible] µL polymerase and distilled water up to 20 µL. 
  PCR conditions: 94°C for 2 minutes; 10 cycles of 94°C for 10 seconds, Tm-5°C (55, 60 or 65°C) for 30 seconds and 68°C for 10/13 minutes; 25 cycles of 94°C for 10 seconds, Tm-5°C (55, 60 or 65°C) for 30 seconds and 68°C for 10/13 minutes plus 20 seconds; 72°C for 5 minutes; hold at 4°C. 

AccuPrime 
  Reaction mixture: 70 ng template DNA, 3.2 µL of 2.5 µmol/L primers, 2 µL 10X Buffer II, 0.1 µL polymerase, and distilled water to 20 µL. 
  PCR conditions: 94°C for 30 seconds; 30 cycles of 94°C for 30 seconds, 60°C for [illegible] seconds and 68°C for 13 minutes; hold at 4°C. 

PrimeSTAR 
  Reaction mixture: 35 ng template DNA, 3.2 µL of 2.5 µmol/L primers, 4 µL 5X buffer, 1.6 µL dNTP, 0.4 µL polymerase, and distilled water to 20 µL. 
  PCR conditions: 30 cycles of 98°C for 10 seconds and 68°C for 10 minutes; hold at 4°C. 

LA Taq 
  Reaction mixture: 35 ng template DNA, 4 µL of 2.5 µmol/L primers, 2 µL 10X Buffer II, 3.2 µL dNTP, 0.2 µL polymerase, and distilled water to 20 µL. 
  PCR conditions: 94°C for 1 minute; 30 cycles of 98°C for 10 seconds and 68°C for 12 minutes; 72°C for 10 minutes; hold at 4°C. 

KAPA (5-18 kb) 
  Reaction mixture: 35 ng template DNA, [illegible] µL of 2.5 µmol/L primers, 2 µL 5X Buffer, [illegible] µL dNTP, 1.4 µL 25 mM MgCl2, 0.2 µL polymerase (2.5 U/µL), and distilled water to 20 µL. 
  PCR conditions: [illegible]°C for 30 seconds; 35 cycles of 94°C for [illegible] seconds, 60°C for 15 seconds and 68°C for 12 minutes; 72°C for 10 minutes; hold at 4°C. 

QIAGEN (0.1-10 kb) 
  Reaction mixture: 35 ng template DNA, 6.4 µL of 2.5 µmol/L primers, 2 µL Buffer, 1 µL dNTP, 0.2 µL enzyme, and distilled water to 20 µL. 
  PCR conditions: [illegible]°C for 3 minutes; 35 cycles of 93°C for 15 seconds, 62°C for 30 seconds and 68°C for 10 minutes; hold at 4°C. 

QIAGEN (>10 kb) 
  Reaction mixture: 70 ng template DNA, [illegible] µL of 2.5 µmol/L primers, 2 µL Buffer, 1 µL dNTP, 0.2 µL enzyme, and distilled water to 20 µL. 
  PCR conditions: 10 cycles of 93°C for 15 seconds, 62°C for 30 seconds and 68°C for 13 minutes; 28 cycles of 93°C for 15 seconds, 62°C for 30 seconds and 68°C for 13 minutes plus 20 seconds; hold at 4°C. 


Sequencing data analysis. Quality control, mapping and variant calling were 
streamlined with SeqMule 10, which wraps popular third-party tools, and we then 
used the wANNOVAR web server 11 to annotate all the detected mutations. 

First, sequencing data were evaluated with FastQC 12. Short reads were aligned 
to the reference genome (hg19) with the BWA-MEM algorithm (version 0.7.4-r385) 13 
using default settings. We then followed the GATK (Genome Analysis Toolkit) best 
practices to identify variants. GATK (version 2.8-1-g932cd3a) 14 was used to realign 
reads and recalibrate base quality scores. Pre-processed BAM files were passed to the 
GATK HaplotypeCaller for variant calling. The resulting SNPs were filtered out if they 
met any of the following criteria: QD (quality by depth) < 2.0, FS (Fisher strand bias 
test score) > 60.0, MQ (root mean square of the mapping quality) < 40.0, 
MappingQualityRankSum (mapping quality rank sum test score) < -12.5, or 
ReadPosRankSum (read position rank sum test score) < -8.0. Indels were filtered out if 
QD < 2.0, ReadPosRankSum < -20 or FS > 200. The VQSR (Variant Quality Score 
Recalibration) method of GATK was not applicable because of the limited number of 
variants. The wANNOVAR server was then used to identify and annotate exonic and 
intronic variants, determine whether the variants had been observed in public databases, and 
predict whether non-synonymous variants were deleterious based on multiple scoring systems. 
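
The hard-filtering rules above can be sketched in a few lines of Python. This is only an illustration of the thresholds quoted in the text, not the GATK implementation: the function name `failed_filters` and the dictionary of annotations are hypothetical, and the annotation keys follow the names used in this section rather than the keys of any particular VCF file.

```python
# Hard-filter thresholds quoted in the text. Each test returns True when the
# variant FAILS that filter; a variant is discarded if any test fails.
SNP_FILTERS = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 40.0,
    "MappingQualityRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}
INDEL_FILTERS = {
    "QD": lambda v: v < 2.0,
    "ReadPosRankSum": lambda v: v < -20.0,
    "FS": lambda v: v > 200.0,
}


def failed_filters(annotations, is_indel):
    """Return the names of the hard filters that a variant fails.

    `annotations` is a hypothetical dict mapping annotation name to a float.
    Annotations that are absent are skipped (an assumption made here because
    rank-sum statistics are not defined at every site).
    """
    filters = INDEL_FILTERS if is_indel else SNP_FILTERS
    return [
        name
        for name, fails in filters.items()
        if name in annotations and fails(annotations[name])
    ]


# Example: a SNP with low quality-by-depth fails only the QD filter.
print(failed_filters({"QD": 1.5, "FS": 10.0, "MQ": 60.0}, is_indel=False))
```

A variant would be kept only when the returned list is empty; note that the same FS or ReadPosRankSum value can fail as a SNP yet pass as an indel because the indel thresholds are looser.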

Results 

PCR results of six long-range PCR enzymes. We selected six long-range 
PCR enzymes for examination, each of which was advertised to 
be able to generate amplicons of 15 kb or more (Table 3). We 
evaluated them on three amplicons with sizes of 12.9 kb, 9.7 kb, and 
5.8 kb and Tm values of 54°C, 63.3°C and 54.5°C, respectively 
(Table 1). In summary, we found that both the PrimeSTAR and 
SequalPrep polymerases amplified all three targets. AccuPrime 
and LA Taq amplified only the 12.9 kb and 5.8 kb targets. 
The KAPA and QIAGEN LongRange polymerases amplified the 
5.8 kb target but not the two larger ones (Figure 1). 

Table 3 | A list of enzymes for long-range PCR reaction. 

Enzyme ID   Product                              Company     Amplicon   High-     Hot-    For GC-rich  Proofreading  Reaction time  Price per 20 µL
                                                             size (kb)  fidelity  start   templates    activity      (13 kb)        reaction
SequalPrep  SequalPrep Long PCR Polymerase       Invitrogen  15         Yes       Yes                                ~7 h           $1.6
AccuPrime   AccuPrime Taq DNA Polymerase         Invitrogen  20         Yes       Yes                  Yes           ~7 h           $0.6
PrimeSTAR   PrimeSTAR GXL DNA Polymerase         TaKaRa      >30        Yes       Yes     Yes                        ~5 h           $0.4
LA Taq      LA Taq Hot Start Version             TaKaRa      >15                  Yes                                ~6 h           $1.4
KAPA        Long Range HotStart DNA polymerase   KAPA        20         Yes       Yes                  Yes           ~7 h           $0.5
QIAGEN      LongRange PCR polymerase             QIAGEN      40         Yes               Yes          Yes           ~8.5 h         $1.0
            ACCUZYME DNA Polymerase              BIOLINE     5          Yes                            Yes
            KOD Hot Start DNA Polymerase         Novagen     12         Yes       Yes     Yes          Yes


The six enzymes require different PCR conditions to work properly 
(Table 2). The PrimeSTAR enzyme can use a unified two-step 
PCR condition to amplify all three targets, which makes PCR 
design and implementation much easier in real-world settings, 
as a single thermocycler run can amplify all targets 
simultaneously. In contrast, SequalPrep needs amplicon-specific 
annealing temperatures and extension times, which for the 
three amplicons were 55°C and 13 minutes, 60°C and 10 minutes, 

Figure 1 | Gel electrophoresis of PCR products from the long-range PCR amplification by six enzymes. (A) and (B): SequalPrep (Three amplicons were 
amplified using amplicon-specific annealing temperature and extension time). (C): AccuPrime (Amplicons for Brca1.1 and 2.8 with similar Tm values 
were amplified). (D): PrimeSTAR (Three amplicons were amplified using a unified two-step PCR condition). (E): LA Taq (Amplicons for Brca1.1 
and 2.8 were amplified). (F): KAPA and QIAGEN (Only the amplicon for Brca2.8 was amplified). 

SCIENTIFIC REPORTS | 4:5737 | DOI: 10.1038/srep05737 



Table 4 | Primer-specific PCR conditions and results 

Enzyme              Sample         Annealing (°C)   Extension (min)   Results
SequalPrep (A, B)   1 2 3, 1 2 3   60 60 60 55 65   10 10 10 13 10    + + - + -
AccuPrime           1 2 3          60               13                + - +
PrimeSTAR           1 2 3                           2-step            + + +
LA Taq              1 2 3                           12                + - +
KAPA                1 2 3          60               12                +
QIAGEN (0.1-10 kb)  1 2 3          62               10                +
QIAGEN (>10 kb)     1 2 3          62               13                [illegible]





65°C and 10 minutes, respectively. Both LA Taq and AccuPrime 
amplified the 12.9 kb and 5.8 kb amplicons, which have similar Tm values, 
using one PCR condition. The KAPA enzyme amplified the 5.8 kb 
target only with an annealing temperature of 55°C and a 13-minute 
extension time, which represents the "longer targets cycle 
conditions" in the user manual. When we used the "very long range" 
reaction mixture and cycle conditions for the QIAGEN enzyme, 
none of the three amplicons was amplified; however, with 
the "long range" PCR conditions, the shortest target (5.8 kb) was 
amplified (Table 4). All experiments were repeated at least twice to 
confirm these findings. 

In addition to reaction time and tolerance to variation in cycling 
conditions, we were also interested in the cost per reaction for these 
enzymes, which matters in practice for large-scale applications. 
Comparing reaction time and price among the six enzymes, 
the PrimeSTAR polymerase stands out with about 5 hours of PCR time and 
a cost of $0.4 per 20 µL reaction. Therefore, we chose the 
PrimeSTAR enzyme for long-range PCR in our NGS experiments 
below. 
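
As a rough worked example of this comparison, the sketch below ranks the six enzymes by the reaction times and per-reaction prices given in Table 3 and estimates the enzyme cost of the BRCA1/2 experiment (17 amplicons in each of nine samples). The helper name `reagent_cost` is hypothetical, and the estimate assumes one 20 µL reaction per amplicon per sample with no repeats; primers, template preparation and labour are ignored.

```python
# Approximate reaction time (hours) and price (USD per 20 µL reaction),
# taken from Table 3.
ENZYMES = {
    "SequalPrep": (7.0, 1.6),
    "AccuPrime": (7.0, 0.6),
    "PrimeSTAR": (5.0, 0.4),
    "LA Taq": (6.0, 1.4),
    "KAPA": (7.0, 0.5),
    "QIAGEN": (8.5, 1.0),
}


def reagent_cost(enzyme, n_amplicons, n_samples):
    """Enzyme cost in USD, assuming one 20 µL reaction per amplicon per sample."""
    _hours, price = ENZYMES[enzyme]
    return price * n_amplicons * n_samples


cheapest = min(ENZYMES, key=lambda name: ENZYMES[name][1])
fastest = min(ENZYMES, key=lambda name: ENZYMES[name][0])
print("cheapest:", cheapest, "| fastest:", fastest)
print("PrimeSTAR, 17 amplicons x 9 samples: $%.2f" % reagent_cost("PrimeSTAR", 17, 9))
```

Under these assumptions the same enzyme is both the cheapest and the fastest, which is the basis for the choice made above.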

Coverage and cost comparison of long-range PCR and custom 
capture arrays. To compare the target coverage of long-range PCR 
with that of capture arrays, we used Agilent SureDesign 15 and 
NimbleGen SeqCap EZ Designs 16 to design capture solutions for 
BRCA1/2. For the Agilent solutions, we evaluated both SureSelect and 
HaloPlex. The designable coverage for exons is over 98% with all 
three designs. However, for exons and introns together, only the 
HaloPlex design achieves 96.6% coverage, whereas SureSelect and 
SeqCap achieve 73.8% and 85.1%, respectively, 
suggesting a reduced ability to cover intronic regions on these 
platforms. In comparison, our long-range PCR method reached up to 
100.00% coverage in practice, 
even in a multiplex sequencing scenario where sequencing 
depth is uneven across samples (Supplementary Table 1). 



As for cost, both SureSelect and HaloPlex might be four times 
as expensive as the long-range PCR method for library preparation, 
according to the quotes for capture probes and other reagents. 
However, this does not take into account labour or equipment 
costs, and some methods are more labour-intensive and error-prone 
than others. Furthermore, the long-range PCR method may have higher 
specificity and uniformity than conventional capture methods, and 
may therefore require lower sequencing coverage to obtain high-quality 
data 17. 

Targeted Amplification of BRCA1 and BRCA2 Genomic Regions. 
To evaluate the PrimeSTAR polymerase in NGS settings, we 
amplified the entire genomic regions of BRCA1 (chr17:41196312-41279500, 
GRCh37/hg19) and BRCA2 (chr13:32889617-32973809, 
GRCh37/hg19). Initially we followed the primer sets reported by 
Ozcelik et al 8, but a few amplicons could not be amplified despite 
multiple attempts to alter cycling conditions. Therefore, we 
re-designed some primer pairs; the updated primers are listed in 
Table 1. 

Nine DNA samples from the peripheral blood of eight control subjects 
and one patient with hereditary breast cancer were used in our NGS 
experiments. Our goal was to evaluate whether the experimental procedure 
works consistently across a group of samples and whether a known 
causal mutation can be identified reliably. For all samples, we were 
able to generate all the BRCA1/2 amplicons successfully, each of which 
displayed a single band of the expected size, without non-specific 
bands or smear (Figure 2). 

Figure 2 | Successful results of 17 long-range PCR amplifications spanning the complete genomic region of BRCA1/2 and their flanking sequences. 
Amplicons 1.1-1.9 cover BRCA1 and 2.1-2.8 cover BRCA2. 

Figure 3 | Visual validation of mutations. (A): A mutation (c.5946delT:p.S1982fs in BRCA2) from the sample with hereditary breast cancer; (B): A 
mutation (c.4563A > G in BRCA2, synonymous SNV) from all samples; (C): A mutation (c.2972A > G:p.E991G) from all samples; (D): Two mutations 
(c.1941C > T:p.S647S and c.1936G > A:p.D646N) in BRCA1 from samples 2, 7 and 8. 

Sequencing the amplicons on MiSeq. We purified all amplicons, 
prepared sequencing libraries, and quantified the libraries using the 
Qubit dsDNA BR Assay system (Invitrogen, Carlsbad, CA). Nine 
normalized libraries were pooled and sequenced together in one 
run on the Illumina MiSeq platform. Subsequently, we used BWA-MEM 13 
to align the sequencing reads, the GATK software tool to call 
variants 14, and the wANNOVAR web server 11,18 to annotate variants 
detected from the sequencing data. On average, each sample had 4.6 
million (range: 2.9-6.9 million) QC-passed reads, and 99.41% (range: 
97.55% to 99.82%) of them were properly aligned and paired. On 
average, 70.99% (range: 45.61% to 85.64%) of the reads in a sample 
mapped to the designed target region. The average coverage on the 
target regions was 2261X (range: 1285X to 3583X); 93.75% 
(range: 81.55% to 100.00%) of the target region had coverage of 
over 10X, and 98% (range: 92.53% to 100%) of the target region was 
covered at least once (Supplementary Table 1). 
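
Summary statistics of this kind can be computed from a per-base depth listing. The sketch below is a minimal illustration, not the pipeline used in this study: it assumes tab-separated records of chromosome, position and depth (the three-column layout written by `samtools depth`), and the function name `coverage_summary` and its arguments are hypothetical. Positions missing from the listing are treated as zero depth, so the target length is passed in separately.

```python
def coverage_summary(depth_lines, target_length, min_depth=10):
    """Summarise per-base depth over a target region.

    `depth_lines` is an iterable of "chrom<TAB>pos<TAB>depth" records.
    Positions absent from the input count as zero depth, which is why
    `target_length` (in bases) must be supplied by the caller.
    """
    total = covered_once = covered_min = 0
    for line in depth_lines:
        if not line.strip():
            continue  # skip blank lines
        _chrom, _pos, depth = line.rstrip("\n").split("\t")[:3]
        depth = int(depth)
        total += depth
        covered_once += depth >= 1
        covered_min += depth >= min_depth
    return {
        "mean_depth": total / target_length,
        "pct_covered": 100.0 * covered_once / target_length,
        "pct_at_min_depth": 100.0 * covered_min / target_length,
    }


# Toy example: a 4-base target in which one position is absent from the listing.
toy = ["chr13\t1\t20", "chr13\t2\t5", "chr13\t3\t0"]
print(coverage_summary(toy, target_length=4))
```

Supplying the target length explicitly avoids overstating coverage when uncovered positions are simply omitted from the depth listing.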

We examined the variant calls generated on these nine samples. 
On average, we identified 234 SNVs per sample, with the vast majority 
being non-coding variants. Based on variant annotation from the 
wANNOVAR web server, these nine samples carried 4, 8, 3, 7, 4, 2, 7, 
7 and 6 non-synonymous SNVs, respectively. Additionally, we identified 
a nonframeshift deletion in one control subject and a frameshift 
deletion in the subject with hereditary breast cancer 
(Supplementary Table 2). The latter is a known disease-causing mutation 
(c.5946delT in BRCA2), and it was verified by visualizing the alignment in the 
Integrative Genomics Viewer 19 (Figure 3A). There are several mutations 
of unknown significance in other samples (Figure 3B-D). All 
non-synonymous SNVs found in our samples are listed in 
Supplementary Table 3. 

Figure 4 | Visualization of sequencing read depth in SeqMonk for three amplicons previously used for comparing enzymes. Left column shows 
amplicons BRCA1.1, BRCA1.6 and BRCA2.8 and their boundaries. Right column shows junction regions. Arrows indicate boundaries. Coverage is log2-transformed. 
The coverage plot demonstrates that significant variations of read depth may still exist within and between amplicons from long-range PCR. 
Loss of coverage may occur at the junction of two amplicons (B), but can be recovered by a larger overlap between the two amplicons (D, F). 

One potential advantage of long-range PCR-based NGS might be 
that the sequence coverage is more likely to be even, given that the 
same amount of starting DNA material is available for all fragments 
from the same amplicon. We used the Wiggle plot in SeqMonk 20 to view 
the sequencing read depth of the three amplicons in BRCA1 and BRCA2 
that were used in the comparative analysis of six enzymes (Figure 4). 
The coverage plot demonstrated that significant variations of read 
depth may still exist even for regions in the same amplicon from 
long-range PCR. Different amplicons (for example, BRCA1.1 and 
BRCA1.2) may also have different coverage, which may be improved 
by better sample normalization during library preparation. 
Additionally, we found that the relative sequencing depth for the 
same region tends to correlate across samples, suggesting that 
coverage correlates with certain sequence features such as GC content, 
repetitive sequence and Nextera transposase insertion sites. At the rims 
of amplicons, coverage tends to be lower than in the neighbouring 
region (e.g. the BRCA1.1-BRCA1.2 junction). This loss of coverage can 
be recovered by a larger overlap between the two amplicons 
(Figure 4D, 4F). Based on our observation, a 1 kb overlap between 
two amplicons seems to be sufficient (Figure 4F). In summary, 
long-range PCR is not immune to the uneven sequence coverage 
typically observed in NGS experiments with capture arrays. 
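
The overlap at each junction can be checked directly from amplicon coordinates when designing primers. The sketch below uses the BRCA2 amplicon coordinates from Table 1 and flags junctions whose overlap falls below a chosen threshold; the function names and the default threshold of 1000 bp (taken from the observation above) are illustrative assumptions, not part of the published method.

```python
# BRCA2 amplicon coordinates (chr13, GRCh37/hg19, 1-based inclusive) from Table 1.
BRCA2_AMPLICONS = [
    ("Brca2.1", 32888055, 32900396),
    ("Brca2.2", 32900267, 32911248),
    ("Brca2.3", 32910302, 32923088),
    ("Brca2.4", 32922508, 32935813),
    ("Brca2.5", 32933399, 32945178),
    ("Brca2.6", 32945170, 32958766),
    ("Brca2.7", 32958373, 32969763),
    ("Brca2.8", 32969199, 32974976),
]


def junction_overlaps(amplicons):
    """Return (left_id, right_id, overlap_bp) for each pair of adjacent amplicons.

    Amplicons are sorted by start position. A value of zero or less means the
    two amplicons do not overlap at all.
    """
    ordered = sorted(amplicons, key=lambda amp: amp[1])
    return [
        (left[0], right[0], left[2] - right[1] + 1)
        for left, right in zip(ordered, ordered[1:])
    ]


def short_junctions(amplicons, min_overlap=1000):
    """Junctions whose overlap is below `min_overlap` base pairs."""
    return [j for j in junction_overlaps(amplicons) if j[2] < min_overlap]


for left_id, right_id, overlap in junction_overlaps(BRCA2_AMPLICONS):
    print(f"{left_id}-{right_id}: {overlap} bp overlap")
```

Junctions returned by `short_junctions` would be candidates for primer re-design if reduced coverage at amplicon boundaries is a concern.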

Discussion 

Long-range PCR has been commonly used to prepare specific 
high-molecular-weight DNA fragments for a variety of applications, 
including cloning, genome mapping and sequencing, and contig 
construction 21. Generally speaking, to successfully amplify all 
amplicons in an experiment, one needs to change the annealing 
temperature and extension time, which are specific to each amplicon 
because the primers may have very different Tm values. In our 
experiment, we found that the TaKaRa PrimeSTAR GXL DNA 
Polymerase can amplify all amplicons of BRCA1/2 without altering 
experimental conditions, which we believe is a key advantage of 
using this enzyme when resources such as thermocyclers are a limiting 
factor in research and clinical settings. 

In addition to long-range PCR, a variety of other methods, such as 
solution-based capture, microarray-based capture, molecular inversion 
probes (MIPs) and multiplex PCR, have been used in target 
enrichment applications. Target enrichment is a highly effective 
way of reducing costs and saving time when only specific genomic 
regions (such as all exons in a gene, or a genomic region spanning a 
few GWAS loci) are of interest. Approaches based on capture, such as 
solution-based capture and microarray-based capture, achieve high 
performance and have advantages for medium to large target regions 
(10-50 Mb) 22. However, the microarray-based methods, such as 
Agilent SureSelect and HaloPlex, require large amounts of input 
DNA as well as expensive hardware for working with 
microarray slides 17,23; solution-based capture, such as NimbleGen 
SeqCap, is less extensively used because of performance issues 24, 
although solution-based capture techniques are constantly improving. 
Generally speaking, GC-rich segments are not well represented in 
captured samples. This may be attributed to sequencing bias, as well as 
to the difficulty of capturing high-GC templates 23. This is less of a concern for 
long-range PCR, which "captures" large regions at once, especially 
with enzymes (such as PrimeSTAR GXL and QIAGEN 
LongRange PCR polymerase) that were optimized for amplifying 
GC-rich segments. MIPs are generally believed to be superior in 
terms of specificity, but are far less amenable to co-processing 
multiple samples in a single reaction. Moreover, their design has to consider 
the uniqueness of each target region fragment and the most suitable 
hybridization conditions 22. Long-range PCR has its unique niche, in 
that it does not require customized design by commercial vendors, 
and can be afforded by small laboratories when a small number of 
samples and continuous regions (such as a full gene region including 
introns) are of interest. 

Although mutations in the coding regions of BRCA1/2 have been 
heavily studied in previous genetic studies, potentially deleterious 
alterations may also reside in the less studied non-coding intronic 
sequences. For example, an insertion/deletion mutation in intron 24 
(3' UTR) of the BRCA1 gene was found in one family with five 
breast cancer patients 25. Additionally, a novel intronic mutation 
(IVS7 + 34_47delTTCTTTTCTTTTTT) and two unclassified intronic 
variations (IVS7 + 34_47delAAGAAAAGAAAAAA in the antisense 
strand and IVS7 + 50_63delTTCTTTTTTTTTTT in the sense 
strand) in BRCA1 were identified in a Thai family with a history of 
breast cancer 26. Olgaet et al 27 reported that an intronic mutation 
(c.6937 + 594T > G) can activate a cryptic exon in BRCA2 that 
disrupts the coding sequence in breast cancer families. For these 
reasons, to gain a more comprehensive understanding of the 
genotype-phenotype relationships in BRCA1/2, it is necessary to examine 
both intronic and exonic regions. 

In this study, we compared six long-range DNA polymerases for 
amplification of three amplicons, with sizes of 12.9 kb, 9.7 kb, and 
5.8 kb, respectively, and found that the TaKaRa PrimeSTAR GXL 
DNA polymerase can amplify almost all amplicons with different 
sizes and Tm values under identical PCR conditions. We demonstrated 
that the real-world performance of enzymes varies greatly 
between manufacturers, despite advertised performance characteristics, 
and showed how to couple long-range PCR with the MiSeq sequencer, 
which results in much faster turnaround time than previously possible. 
Overall, this report provides a practical guide on how to use 
long-range PCR to perform NGS on large genomic regions, especially 
when the entire gene regions including introns are of interest. 